A New Statistical Approach to Personal Name Extraction
Abstract
We propose a new statistical approach to extracting personal names from a corpus. A key feature of the approach is that it both learns the characteristics of personal names automatically from a large training corpus and makes good use of human empirical knowledge (e.g., a context-free grammar). In addition, rather than making the traditional simple true/false determination, it assigns a confidence measure to each extracted personal name. Another main contribution of this work is that we have applied the personal name extraction technology in a real application, a Chinese input system, and achieved an approximately 7% error rate reduction for all characters and a 30% error rate reduction for personal names.
Similar resources
The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)
Understanding architectural heritage and its various aspects has always been a focus for the international conservation communities. In recent decades, even though place-names are part of living history as well as cultural heritage, they have constantly faced rapid, precipitous changes. As such, in the conservation literature, most studies have skipped ad...
PolyUHK: A Robust Information Extraction System for Web Personal Names
Personal information extraction is an important component of advanced information retrieval. Two problems need to be solved in this practical task: personal name ambiguity and extraction of personal information for a specific person. For personal name ambiguity, which is a very common phenomenon in the fast-growing Web resource, we propose a robust system which extracts features wit...
A Statistical Approach to Classify Nationality of Name
Named entities (NEs), especially personal names, are very important components in interpreting some kinds of text documents, e.g., news. To extract personal names efficiently, statistical language models are required to capture the characteristics of personal names. Among these characteristics, the nationality of a name is a useful source for interpreting the text document. Automatically inferencing nation...
PRIS at Chinese Language Processing
The more Chinese language materials come out, the more we have to focus on the “same personal name” problem. In our personal name disambiguation system, hierarchical agglomerative clustering is applied, and named entities are used as features for document similarity calculation. We propose a two-stage strategy in which the first stage involves word segmentation and named entity recognition (NER...
A Fast Localization and Feature Extraction Method Based on Wavelet Transform in Iris Recognition
With an increasing emphasis on security, automated personal identification based on biometrics has been receiving extensive attention. Iris recognition, as an emerging biometric recognition approach, is becoming a very active topic in both research and practical applications. In general, a typical iris recognition system includes iris imaging, iris liveness detection, and recognition. This rese...